An auditory-based measure for improved phone segment concatenation

Authors

David T. Chappell

John H. L. Hansen

Abstract

This paper describes a new auditory-based distance measure intended for use in a concatenated synthesis technique wherein time- and frequency-domain characteristics are used to perform natural-sounding speaker synthesis. Whereas most concatenation systems use large databases (often more than 100,000 units), we begin from a small, limited database (approx. 400 units) and use a new spectral distortion measure to aid in the selection of phones for optimal concatenation. At the transition between speech segments, the new auditory-based distance metric assesses perceived discontinuities in the frequency domain. The distortion measure, which employs the Carney auditory model, is used to select phones that minimize the perceived distortion between concatenated segments. Moreover, time- and frequency-domain methods can shape the prosodic and spectral characteristics of each speech segment. The final results demonstrate improved performance over standard concatenation methods applied to small databases.
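
The selection step described in the abstract (choosing phones so that the distortion at each join is minimized) can be illustrated with a minimal sketch. Everything here is an assumption for illustration: the unit layout (`id`, `start`, `end`), the function names, and the dynamic-programming search are hypothetical, and the Euclidean `spectral_distance` is only a placeholder standing in for the paper's auditory-based measure, which relies on the Carney auditory model and is not reproduced here.

```python
import math


def spectral_distance(a, b):
    """Placeholder join cost (assumption): Euclidean distance between
    two boundary feature vectors. The paper's measure instead uses the
    Carney auditory model, which this sketch does not implement."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def select_units(candidates, distance=spectral_distance):
    """Pick one candidate per phone slot so the summed join cost is minimal.

    candidates: list with one entry per phone slot; each entry is a
    non-empty list of units, and each unit is a dict with hypothetical
    keys 'id', 'start' and 'end' (boundary feature vectors).
    Returns (total_cost, [chosen unit ids]).
    """
    if not candidates:
        return 0.0, []
    # history[t][j] = (cheapest cost of a path ending in unit j of slot t,
    #                  index of the predecessor unit in slot t - 1)
    history = [[(0.0, None) for _ in candidates[0]]]
    for prev_slot, slot in zip(candidates, candidates[1:]):
        row = []
        for unit in slot:
            cost, arg = min(
                (history[-1][i][0] + distance(p["end"], unit["start"]), i)
                for i, p in enumerate(prev_slot)
            )
            row.append((cost, arg))
        history.append(row)
    # Backtrack from the cheapest unit in the final slot.
    j = min(range(len(history[-1])), key=lambda k: history[-1][k][0])
    total = history[-1][j][0]
    path = []
    for t in range(len(candidates) - 1, -1, -1):
        path.append(candidates[t][j]["id"])
        j = history[t][j][1]
    path.reverse()
    return total, path


if __name__ == "__main__":
    # Toy database: two candidate units for each of three phone slots.
    toy = [
        [{"id": "a1", "start": [0.0, 0.0], "end": [1.0, 0.0]},
         {"id": "a2", "start": [0.0, 0.0], "end": [5.0, 5.0]}],
        [{"id": "b1", "start": [5.0, 5.0], "end": [0.0, 0.0]},
         {"id": "b2", "start": [1.0, 1.0], "end": [9.0, 9.0]}],
        [{"id": "c1", "start": [0.0, 0.0], "end": [0.0, 0.0]},
         {"id": "c2", "start": [9.0, 8.0], "end": [0.0, 0.0]}],
    ]
    print(select_units(toy))
```

With a small database of roughly 400 units, an exhaustive search over joins of this kind stays cheap, so the quality of the result depends mainly on how well the distance function tracks perceived discontinuity; swapping a different measure in through the `distance` parameter leaves the search unchanged.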

Similar resources

Organizing phone models based on piecewise linear segment lattices of speech samples

Aiming at robust speech recognition, we have proposed a framework for “phonological concept formation,” which is the task of acquiring an efficient representation of phonemes from spoken word samples without using any transcriptions except for the lexical classification of the words. In order to implement this task, we propose the “piecewise linear segment lattice (PLSL)” model for phoneme repr...

Generalized phone modeling based on piecewise linear segment lattice

The goal of this work is to model phone-like units automatically from spoken word samples without using any transcriptions except for the lexical identification of the words. In order to implement this task, we have proposed the “piecewise linear segment lattice (PLSL)” model for phoneme representation. The structure of this model is a lattice of segments, each of which is represented as regress...

A comparison of spectral smoothing methods for segment concatenation based speech synthesis

There are many scenarios in both speech synthesis and coding in which adjacent time-frames of speech are spectrally discontinuous. This paper addresses the topic of improving concatenative speech synthesis with a limited database by proposing methods to smooth, adjust, or interpolate the spectral transitions between speech segments. The objective is to produce natural-sounding speech via segmen...

High-Quality and Flexible Speech Synthesis with Segment Selection and Voice Conversion

Text-to-Speech (TTS) is a useful technology that converts any text into a speech signal. It can be utilized for various purposes, e.g. car navigation, announcements in railway stations, response services in telecommunications, and e-mail reading. Corpus-based TTS makes it possible to dramatically improve the naturalness of synthetic speech compared with the early TTS. However, no general-purpos...

Automatically Creating a Diphone Set from a Speech Database

This paper presents a measure that scores various aspects of phone quality. The measure is designed to penalize phone instances with one or several characteristics that are not desirable in concatenation-based speech synthesis. Depending on the phone type, these aspects amongst others include spectrum, phase, fundamental frequency, duration, voicing and plosive quality. We applied this quality ...

Publication date: 1997